Objectives:


library(dplyr)
library(sf)
library(ggplot2)
library(osmdata)
library(leaflet)
library(nominatimlite)

Part 1. Import vector data from Open Street Map

What is Open Street Map?

Open Street Map (OSM) is a collaborative project which aims at mapping the world and sharing geospatial data in an open way. Anyone can contribute, by mapping geographical objects their encounter, by adding topical information on existing map objects (their name, function, capacity, etc.), or by mapping buildings and roads from satellite imagery (cf. HOT: Humanitarian OpenStreetMap Team).

This information is then validated by other users and eventually added to the common “map” or information system. This ensures that the information is accessible, open, verified, accurate and up-to-date.

The result looks like this: View of OSM web interface The geospatial data underlying this interface is made of geometrical objects (i.e. points, lines, polygons) and their associated tags (#building #height, #road #secondary #90kph, etc.).

How to extract geospatial data from Open Street Map?

Bounding-box

The first thing to do is to define the area within which you want to retrieve data, aka the bounding box. This can be defined easily using a place name and the package nominatimlite to access the free Nominatim API provided by OpenStreetMap.

We are going to look at Brielle together, but you can also work with the small cities of Naarden, Geertruidenberg, Gorinchem, Enkhuizen or Dokkum.

/! This might not be needed, but ensures that your machine knows it has internet…

assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
nominatim_polygon <- nominatimlite::geo_lite_sf(address = "Brielle", points_only = FALSE)
bb <- sf::st_bbox(nominatim_polygon)
bb
     xmin      ymin      xmax      ymax 
 4.135294 51.883500  4.229222 51.931410 
  • Word of caution

There might multiple responses from the API query, corresponding to different objects at the same location, or different objects at different locations. For example: Brielle (Netherlands) and Brielle (New Jersey)

Brielle, Netherlands Brielle, New Jersey

We should therefore try to be as unambiguous as possible by adding a country code or district name.

nominatim_polygon <- nominatimlite::geo_lite_sf(address = "Brielle, NL", points_only = FALSE)
bb <- sf::st_bbox(nominatim_polygon)
bb
     xmin      ymin      xmax      ymax 
 4.135294 51.883500  4.229222 51.931410 

Extracting features

A feature in the OSM language is a category or tag of a geospatial object. Features are described by general keys (e.g. “building”, “boundary”, “landuse”, “highway”), themselves decomposed into sub-categories (values) such as “farm”, “hotel” or “house” for buildings, “motorway”, “secondary” and “residential” for highway. This determines how they are represented on the map.

Searching documentation

Let’s say we want to download data from OpenStreetMap and we know there is a package for it named osmdata, but we don’t know which function to use and what arguments are needed. Where should we start?

Let’s check the documentation online:

The OSMdata Documentation page

It appears that there is a function to extract features, using the Overpass API. This function’s name is opq (for OverPassQuery) which, in combination with add_osm_feature, seems to do the job. However it might not be crystal clear how to apply it to our case. Let’s click on the function name to know more.

The Overpass Query Documentation page

On this page we can read about the arguments needed for each function: a bounding box for opq() and some key and value for add_osm_feature(). Thanks to the examples provided, we can assume that these keys and values correspond to different levels of tags from the OSM classification. In our case, we will keep it at the first level of classification, with “buildings” as key, and no value. We also see from the examples that another function is needed when working with the sf package: osmdata_sf(). This ensures that the type of object is suited for sf. With these tips and examples, we can write our feature extraction function as follows:

x <- opq(bbox = bb) %>%
   add_osm_feature(key = 'building') %>%
    osmdata_sf()

What is this x object made of? It is a table of all the buildings contained in the bounding box, which gives us their OSM id, their geometry and a range of attributes, such as their name, building material, building date, etc. The completion level of this table depends on user contributions and open resources (here for instance: BAG, different in other countries).

head(x$osm_polygons)

Mapping attributes

For instance: the building age focusing on post-1900 buildings.

Projections

First, we are going to select the polygons and reproject them with the Amersfoort/RD New projection, suited for maps centred on the Netherlands. This code for this projection is: 28992.

buildings <- x$osm_polygons %>%
  st_transform(.,crs=28992)

Mapping

Then we create a variable which a threshold at 1900. Every date prior to 1900 will be recoded 1900, so that buildings older than 1900 will be represented with the same shade.

Then we use the ggplot function to visualise the buildings by age. The specific function to represent information as a map is geom_sf(). The rest works like other graphs and visualisation, with aes() for the aesthetics.

buildings$build_date <- as.numeric(
  if_else(
    as.numeric(buildings$start_date) < 1900, 
          1900, 
          as.numeric(buildings$start_date)
          )
  )

 ggplot(data = buildings) +
   geom_sf(aes(fill = build_date, colour=build_date))  +
   scale_fill_viridis_c(option = "viridis")+
   scale_colour_viridis_c(option = "viridis")

So this reveals the historical centre of [city] and the various extensions. Anything odd? what? around the centre? Why these limits / isolated points?

Challenge: import an interactive basemap layer under the buildings with Leaflet (20 min)

  • Check out the leaflet package documentation
  • Plot a basemap in Leaflet and try different tiles. Basemap documentation
  • Transform the buildings into WGS84 projection and add them to the basemap layer with the addPolygons function.
  • Have the fillColor of these polygons represent the build_date variable. Choropleth documentation. Using the example and replace the variable names where needed.

Solution

Part 2. Basic GIS operations with the sf package

Why sf?

sf is a package which supports simple features (sf), “a standardized way to encode spatial vector data.”. It contains a large set of functions to achieve all the operations on vector spatial data for which you might use traditional GIS software: change the coordinate system, join layers, intersect or unit polygons, create buffers and centroids, etc. cf. the sf cheatsheet.

Conservation in Brielle, NL

Let’s focus on old buildings and imagine we’re in charge of their conservation. We want to know how much of the city would be affected by a non-construction zone of 100m around pre-1800 buildings.

Let’s select them and see where they are.

old <- 1800 # year prior to which you consider a building old

summary(buildings$start_date)
   Length     Class      Mode 
    10608 character character 
buildings$start_date <- as.numeric(as.character(buildings$start_date))

old_buildings <- buildings %>%
  filter(start_date <= old)

 ggplot(data = old_buildings) + geom_sf(colour="red")

As conservationists, we want to create a zone around historical buildings where building regulation will have special restrictions to preserve historical buildings.

Buffer

Let’s say this zone should be 100 meters. In GIS terms, we want to create a buffer around polygons. The corresponding function sf is st_buffer, with 2 arguments: the polygons around which to create buffers, and the radius of the buffer.

 distance <- 100 # in meters 
 
buffer_old_buildings <- 
  st_buffer(x = old_buildings, dist = distance)
 
 ggplot(data = buffer_old_buildings) + geom_sf()

Union

Now, we have a lot of overlapping buffers. We would rather create a unique conservation zone rather than overlapping ones in that case. So we have to fuse the overlapping buffers into one polygon. This operation is called union and the corresponding function is st_union.

 single_old_buffer <- st_union(buffer_old_buildings) %>%
   st_cast(to = "POLYGON") %>%
  st_as_sf() 

single_old_buffer<- single_old_buffer %>%
  mutate("ID"=as.factor(1:nrow(single_old_buffer))) %>%
  st_transform(.,crs=28992) 

We also use st_cast() to explicit the type of the resulting object (POLYGON instead of the default MULTIPOLYGON), st_as_sf() to transform the polygon into an sf object.

Centroids

For the sake of visualisation speed, we would like to represent buildings by a single point (for instance: their geometric centre) rather than their actual footprint. This operation means defining their centroid and the corresponding function is st_centroid.

sf::sf_use_s2(FALSE)
centroids_old <- st_centroid(old_buildings) %>%
  st_transform(.,crs=28992)  

ggplot() + 
    geom_sf(data = single_old_buffer, aes(fill=ID)) +
    geom_sf(data = centroids_old)

Intersection

Now what we would like to distinguish conservation areas based on the number of historic buildings they contain. In GIS terms, we would like to know how many centroids each fused buffer polygon contains. This operation means intersecting the layer of polygons with the layer of points and the corresponding function is st_intersection.

 centroids_buffers <- 
  st_intersection(centroids_old,single_old_buffer) %>%
   mutate(n=1)

 centroid_by_buffer <- centroids_buffers %>%
   group_by(ID) %>%
   summarise(n = sum(n))
 single_buffer <- single_old_buffer %>%
   mutate(n_buildings = centroid_by_buffer$n)

  ggplot() + 
   geom_sf(data = single_buffer, aes(fill=n_buildings)) +
   scale_fill_viridis_c(alpha = 0.8,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B")

Final output:

Let’s map this layer over the initial map of individual buildings.

ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Exercise (10 min)

Challenge: Conservation rules have changed. The historical threshold now applies to all pre-war buildings, but the distance to these building is reduced to 10m. Can you map the number of all buildings per 10m fused buffer?


Solution

 old <- 1939 
 distance <- 10
 #select
 old_buildings <- buildings %>%
   filter(start_date <= old)
 #buffer
 buffer_old_buildings <- st_buffer(old_buildings, dist = distance)
  #union
 single_old_buffer <- st_union(buffer_old_buildings) %>%
   st_cast(to = "POLYGON") %>%
   st_as_sf()  
 
 single_old_buffer <- single_old_buffer %>%
   mutate("ID"=1:nrow(single_old_buffer))  %>%
   st_transform(single_old_buffer,crs=4326) 
 #centroids
 centroids_old <- st_centroid(old_buildings) %>%
   st_transform(.,crs=4326)  
  #intersection
  centroids_buffers <- st_intersection(centroids_old,single_old_buffer) %>%
   mutate(n=1)
 centroid_by_buffer <- centroids_buffers %>% 
   group_by(ID) %>%
   summarise(
   n = sum(n)
   )
 single_buffer <- single_old_buffer %>% 
   mutate(n_buildings = centroid_by_buffer$n)
  ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Problem: there are many pre-war buildings and the buffers are large so the number of old buildings is not very meaningful. Let’s compute the density of old buildings per buffer zone.

Area

single_buffer$area <- st_area(single_buffer, )  %>% units::set_units(., km^2)

single_buffer$old_buildings_per_km2 <- as.numeric(single_buffer$n_buildings / single_buffer$area)

 ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=old_buildings_per_km2), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Summary and keypoints

We have seen how OpenStreetMap (OSM) geodata works and how to import, select, and visualise OSM vector data. We have seen how to create spatial buffers and centroids, how to intersect vector data and how retrieve the area of polygons.

In short: